Unsupervised learning of multi-word verbs

Authors

  • Don Blaheta
  • Mark Johnson
Abstract

Collocation is a linguistic phenomenon that is difficult to define and harder to explain; it has been largely overlooked in the field of computational linguistics due to its difficulty. Although standard techniques exist for finding collocations, they tend to be rather noisy and suffer from sparse data problems. In this paper, we demonstrate that by utilising parsed input to concentrate on one very specific type of collocation (in this case, verbs with particles, a subset of the so-called “multi-word” verbs) and applying an algorithm to promote those collocations in which we have more confidence, the problems with statistically learning collocations can be overcome.
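As a rough illustration of the kind of "standard technique" the abstract alludes to, the sketch below ranks verb-particle pairs by pointwise mutual information (PMI) with a minimum-count cutoff. This is not the algorithm proposed in the paper, which the abstract does not specify; the function name, the input format (a list of `(verb, particle)` tuples assumed to come from parsed text), and the `min_count` threshold are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def rank_verb_particle_pairs(pairs, min_count=2):
    """Rank (verb, particle) pairs by pointwise mutual information.

    `pairs` is a list of (verb, particle) tuples, assumed to have been
    extracted from parsed sentences beforehand (hypothetical input format).
    Pairs seen fewer than `min_count` times are dropped, a crude guard
    against the sparse-data noise that PMI is prone to.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    particle_counts = Counter(particle for _, particle in pairs)
    total = len(pairs)

    scored = []
    for (verb, particle), n in pair_counts.items():
        if n < min_count:
            continue
        # PMI = log2( P(verb, particle) / (P(verb) * P(particle)) )
        pmi = math.log2(n * total / (verb_counts[verb] * particle_counts[particle]))
        scored.append(((verb, particle), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data, invented purely for illustration.
toy_pairs = (
    [("give", "up")] * 3
    + [("look", "up")] * 1
    + [("look", "at")] * 4
    + [("walk", "at")] * 2
)
ranked = rank_verb_particle_pairs(toy_pairs, min_count=2)
for pair, score in ranked:
    print(pair, round(score, 3))
```

The cutoff shows the trade-off the abstract mentions: raising `min_count` reduces noise from rare pairs but discards more of an already sparse sample.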


Similar resources

An Unsupervised Verb Class Disambiguation

We present an unsupervised learning method for disambiguating verbs that belong to more than one Levin verb class (1993) when occurring in a particular syntactic frame. We used examples that contain unambiguous verbs in each verb class as the training data for ambiguous verbs in that class. A Naive Bayesian classifier was employed for the disambiguation task using context words as features. Our...


Unsupervised Verb Inference from Nouns Crossing Root Boundary

Inference about whether a word in one text has similar meaning to another word in the other text is an essential task in order to understand whether two texts have similar meaning. However, this inference becomes difficult especially when two words do not share a lexical root, do not have the same argument structure, or do not have the same part-of-speech. This paper presents an unsupervised ap...


Distributional Semantics Approach to Thai Word Sense Disambiguation

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are knowledge-based, corpus-based, and hybrid. This paper pays attention to the corpus-based strategy...


Analysis of functional similarities of Finnish verbs using the self-organizing map

Obtaining semantic or functional word categories from data in an unsupervised manner is a problem motivated both from the linguistic point of view and from that of construing language models for various language processing tasks. In this work, we use the Self-Organizing Map algorithm to visualize and cluster common Finnish verbs based on their immediate morphological contexts. Based on a data s...


SEMANTIC CLUSTERING OF VERBS Analysis of Morphosyntactic Contexts Using the SOM Algorithm

Obtaining semantic or functional word categories from data in an unsupervised manner is a problem motivated both from the linguistic point of view and from that of construing language models for various language processing tasks. In this work, we use the self-organizing map algorithm to visualize and cluster common Finnish verbs based on functional and semantic information coded by case marking...




Publication date: 2001